Clustering with Partial Information
Abstract
The Correlation Clustering problem, also known as the Cluster Editing problem, seeks to edit a given graph by adding and deleting edges to obtain a collection of vertex-disjoint cliques, such that the editing cost is minimized. The Edge Clique Partitioning problem seeks to partition the edges of a given graph into edge-disjoint cliques, such that the number of cliques is minimized. Both problems are known to be NP-hard, and they have previously been studied with respect to approximation and fixed-parameter tractability. In this paper we study these two problems in a more general setting that we term fuzzy graphs, where the input graphs may have missing information, meaning that whether or not there is an edge between some pairs of vertices of the input graph can be undecided. For fuzzy graphs, the Correlation Clustering and Edge Clique Partitioning problems have previously been studied only with respect to approximation. Here we give parameterized algorithms based on kernelization for both problems. We prove that the Correlation Clustering problem is fixed-parameter tractable on fuzzy graphs when parameterized by (k, r), where k is the editing cost and r is the minimum number of vertices required to cover the undecided edges. In particular, we show that it has a polynomial-time reduction to a problem kernel on O(k^2 + r) vertices. We provide an analogous result for the Edge Clique Partitioning problem on fuzzy graphs. Using (k, r) as parameters, where k bounds the size of the partition and r is the minimum number of vertices required to cover the undecided edges, we describe a polynomial-time kernelization to a problem kernel on O(k^2 · 3^r) vertices. This implies fixed-parameter tractability for this parameterization. Furthermore, we show that parameterizing only by the number of cliques k is not enough to obtain fixed-parameter tractability: the problem remains NP-hard for each fixed k > 2.
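To make the fuzzy-graph setting concrete, the following is a minimal Python sketch of how the editing cost of a candidate clustering might be evaluated. It is an illustration, not the paper's kernelization algorithm. The function name `editing_cost` and the input layout are hypothetical, and the sketch assumes that an undecided pair contributes nothing to the cost whichever way it is resolved.

```python
from itertools import combinations


def editing_cost(vertices, edges, undecided, clusters):
    """Count the edits needed to turn a fuzzy graph into the given cluster graph.

    Assumption (not stated in the abstract): undecided pairs are free,
    so only decided pairs can contribute to the cost.
    """
    edge_set = {frozenset(e) for e in edges}
    undecided_set = {frozenset(e) for e in undecided}
    # Map every vertex to the index of the cluster that contains it.
    cluster_of = {v: i for i, cluster in enumerate(clusters) for v in cluster}

    cost = 0
    for u, v in combinations(vertices, 2):
        pair = frozenset((u, v))
        if pair in undecided_set:
            continue  # undecided pair: no cost under the assumption above
        same_cluster = cluster_of[u] == cluster_of[v]
        if same_cluster and pair not in edge_set:
            cost += 1  # a missing edge inside a cluster must be added
        elif not same_cluster and pair in edge_set:
            cost += 1  # an edge between two clusters must be deleted
    return cost


# A path a-b-c-d in which the pair (a, c) is undecided.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
clusters = [{"a", "b", "c"}, {"d"}]

cost_fuzzy = editing_cost(vertices, edges, [("a", "c")], clusters)
cost_decided = editing_cost(vertices, edges, [], clusters)
print(cost_fuzzy, cost_decided)
```

In this toy instance, treating (a, c) as undecided saves one edit compared with treating it as a decided non-edge, because only the edge (c, d) then has to be deleted.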
Related articles
Mechanisms of Partial Supervision in Rough Clustering Approaches
We bring two rough-set-based clustering algorithms into the framework of partially supervised clustering. A mechanism of partial supervision relying on either qualitative or quantitative information about memberships of patterns to clusters is envisioned. Allowing such knowledge-based hints to play an active role in the clustering process has proved to be highly beneficial, according to our empirica...
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
Clustering of a Number of Genes Affecting in Milk Production using Information Theory and Mutual Information
Information theory is a branch of mathematics. Information theory is used in genetic and bioinformatics analyses and can be used for many analyses related to the biological structures and sequences. Bio-computational grouping of genes facilitates genetic analysis, sequencing and structural-based analyses. In this study, after retrieving gene and exon DNA sequences affecting milk yield in dairy ...
Investigation through and Clustering the Information Needs and Information Seeking Behavior of Seminary and University Students of Khorasan-e- Razavi with Neural Network Analysis
Background and Aim: This study aims to investigate and cluster the information needs and information seeking behavior of seminary and university students in Khorasan-e- Razavi using neural network analysis. Methods: The quantitative study is an applied and descriptive survey conducted with neural network analysis. Data were collected by a questionnaire based on the information needs and inf...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information gathering technologies such as GPS and GSM networks has led to huge complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large datasets has always been one of the most important challenges in different sciences,...